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ABSTRACT 

Next generation galaxy surveys demand the development of massive ensembles of 
galaxy mocks to model the observables and their covariances, what is computationally 
prohibitive using A^-body simulations. COLA is a novel method designed to make this 
feasible by following an approximate dynamics but with up to 3 orders of magnitude 
speed-ups when compared to an exact A-body. In this paper we investigate the opti¬ 
mization of the code parameters in the compromise between computational cost and 
recovered accuracy in observables such as two-point clustering and halo abundance. 
We benchmark those observables with a state-of-the-art A-body run, the MICE Grand 
Challenge simulation (MICE-GC). We find that using 40 time steps linearly spaced 
since Zi ~ 20, and a force mesh resolution three times finer than that of the number 
of particles, yields a matter power spectrum within 1% for k < 1 /iMpc”^ and a halo 
mass function within 5% of those in the A-body. In turn the halo bias is accurate 
within 2% for k < 0.7hMpc“^ whereas, in redshift space, the halo monopole and 
quadrupole are within 4% for k < 0.4/iMpc“^. These results hold for a broad range 
in redshift (0 < 2 < 1) and for all halo mass bins investigated (M > 10^^'® h~^ M©). 
To bring accuracy in clustering to one percent level we study various methods that 
re-calibrate halo masses and/or velocities. We thus propose an optimized choice of 
COLA code parameters as a powerful tool to optimally exploit future galaxy surveys. 
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1 INTRODUCTION 

Present and planned galaxy surveys like DES^, LSST^, Eu¬ 
clid®, eBOSS^, DESI®, will generate a wealth of high-quality 
data that will allow to test the nature of dark-energy and 
constrain possible deviations from the standard cosmological 
model based on General Relativity (Weinberg et al. 2013). 

An optimal extraction of cosmological parameters from 
those very large and complex datasets will ultimately rely on 
our ability to model cosmological observables and their co- 
variances with high accuracy. This entails the development 
of synthetic observations based on mock catalogues pro¬ 
duced from numerical simulations. The requirement of sam¬ 
pling large cosmological volumes while still resolving small 
scales is a big challenge to current A-body simulation codes 
(Kim et al. 2011; Angulo et al. 2012; Alimi et al. 2012; Skill- 
man et al. 2014; Heitmann et al. 2014; Fosalba et al. 2015b; 


^ http: //www. darkenergysurvey. org/ 

^ http://www.lsst.org 
^ http://www.euclld-ec.org/ 

^ http://www.sdss.org/surveys/eboss/ 
® http://desi.lbl.gov/ 


for a review see Kuhlen et al. 2012). Moreover, hundreds or 
thousands of realizations are needed for robustly estimat¬ 
ing covariance matrices (crucial for cosmological parameter 
estimation, see Taylor et al. 2013; Blot et al. 2015) or for 
propagating errors in complex and non-linear analysis (e.g. 
BAO reconstruction, see Takahashi et al. 2009; Kazin et al. 
2014; Ross et al. 2015). Yet, producing massive ensembles of 
A-body mocks is computationally prohibitive and alterna¬ 
tive routes need to be devised in order to face the enormous 
challenge. 

A convenient approach to reproduce the observed 
galaxy distribution on large scales is to populate dark mat¬ 
ter haloes with galaxies using either Semi-Analytic Models 
or techniques relying on the Halo model, such as the Halo 
Occupation distribution or the Sub-halo Abundance Match¬ 
ing (see Knebe et al. 2015, for a comparison of models). 
Some of these techniques do not need to fully capture inter¬ 
nal halo substructure since they only use reliable positions 
and mass estimates for haloes, which can be predicted by 
approximate methods. A simplified and cheaper evolution 
of the density field is enough to approximately predict the 
abundance and clustering of collapsed regions to build halo 
mock catalogues. Available methods that are based on this 
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idea include: pinocchio (PINpointing Orbit-Crossing Co¬ 
lapsed Hierarchical Objects, Monaco et al. 2002; Monaco 
et al. 2013), PThaloes (Scoccimarro & Sheth 2002; Manera 
et al. 2015) and recently COLA (COmoving Lagrangian Ac¬ 
celeration, Tassev, Zaldarriaga & Eisenstein 2013; Tassev, 
Eisenstein, Wandelt & Zaldarriaga 2015). 

An alternative approach is to use prescriptions to assign 
haloes in a density field produced by a simple gravity solver. 
This is the case of the log-normal model (Coles & Jones 
1991), QPM (Quick Particle Mesh White, Tinker & McBride 
2013), PATCHY (PerturbAtion Theory Catalog generator of 
Halo and galaxY distributions Kitauraet al. 2015), EZmocks 
(effective Zel’dovich approximation mock catalogues Chuang 
et al. 2015a). Thus these methods constitute a more direct 
modeling for the galaxy distribution. The drawback is that 
they contain many internal parameters describing properties 
such as the galaxy clustering and abundance that have to 
be fit in order to correctly reproduce observations. COLA can 
be categorized as a semi-Wbody method and therefore has 
higher computational requirements than other methodolo¬ 
gies. But is more predictive and yields accurate high order 
clustering statistics. For a recent performance comparison of 
fast mock methods see Chuang et al. (2015b). 

A common feature in most of the fast methods is that 
they evolve mass particles using Lagrangian Perturbation 
Theory (LPT). COLA is unique because on top of the analyt¬ 
ical trajectory it adds a residual displacement computed by 
an Y-body solver. Equations of motion are solved in a frame 
comoving with LPT observers which, at a given perturbative 
order, encodes more non-linear growth information than the 
corresponding Eulerian approach. This guarantees an accu¬ 
rate description of the dynamics on large scales where the 
evolution is quasi-linear. The numerical evolution is simpli¬ 
fied with respect to full A^-body codes, making use of a fine 
Particle-Mesh (PM) algorithm, and evaluating forces for a 
few (i.e, order of ten) time steps, haloes can then be iden¬ 
tified running a finder in the evolved dark matter particle 
distribution, in the same way is done for an A-body. 

In COLA, the displacement field x is decomposed into 
two terms: aJupT describes the LPT trajectory and ajres is 
the residual displacement with respect to the LPT path, 

Xieaif) = X{t) - Xl^PT{t) (1) 

The equation of motion in a pure gravitational simula¬ 
tion relates the acceleration to the Newtonian potential 'E: 
dtx{t) = —V<l>(t). Using Eq. 1, it can be rewritten as 

dfxresit) = -V$(t) - dtXl,PT{t), (2) 

where E is evaluated at the position x. COLA discretizes the 
time derivatives only on the left-hand side, while uses the 
LPT expression at the right-hand side. More recently, Tas¬ 
sev et al. (2015) extended this reformulation to the Eulerian 
domain, which allow the simulation of a sub-volume embed¬ 
ded in a larger effective simulation box. 

Kazin et al. (2014) developed a parallel version of COLA, 
suitable for the massive production of mock catalogues, 
which they used for constraining the BAO signal in the Wig- 
gleZ survey. This paper is based on their implementation. 

COLA is able to accurately follow the evolution of the 
dark matter distribution on scales of few Mpcs. More chal¬ 
lenging is to reproduce the birth and growth of haloes, which 
display high density contrasts and non-linear dynamics sus¬ 


tained by virialisation. Halo formation is very sensitive to 
the degree of approximation in the dynamics at small scales 
and a minimum accuracy is indispensable to generate reli¬ 
able halo mock catalogues. Therefore, it is essential to assess 
the performance of the method under different values for the 
internal code parameters that describe the spatial and tem¬ 
poral discretization of the gravitational evolution. 

In this paper we develop a suite of large COLA simu¬ 
lations to explore the impact of varying internal code pa¬ 
rameters on basic observables such as the matter real-space 
power spectrum and the halo mass function. In particular, 
we explore the size of the force evaluation mesh, the number 
and distribution of time steps, and the initial redshift, in 
combination with mass resolution. As a reference we use the 
A-body simulation MICE-GC, which has been extensively 
validated (Fosalba et al. 2015b; Crocce et al. 2015; Fosalba 
et al. 2015a). Within this parameter space we find the con¬ 
figuration that yields the best accuracy in power spectrum 
and mass function (for a range of halo masses, comoving 
scales and redshifts) without a large increase in computa¬ 
tional cost. For the optimal parameters, we also character¬ 
ize the recovered accuracy for halo clustering in real and 
redshift space. The above procedure yields what can be re¬ 
garded as the optimal accuracy of the code on its own. To 
improve it further one needs to rely on simple corrections to 
halo masses using an external simulation. In Tassev et al. 
(2013) they already showed that the halo mass should be 
corrected and they used a constant multiplicative factor at 
all masses and redshifts. In this work, the first correction we 
explore aims at matching the halo abundance and the sec¬ 
ond the halo clustering amplitude on large scales. Thanks to 
them, deviations in halo clustering can be reduced to within 
the percent level in most situations. 

This paper is organized as follows. In Sec. 2 we describe 
the simulations used throughout this paper, as well as their 
analysis. Then we start in Sec. 3 discussing the capabili¬ 
ties and limitations of COLA when run with as few as 10 
time steps. Sec. 4 presents the code parameters exploration, 
where we show the accuracy that can be achieved using dif¬ 
ferent configurations and we give optimal parameters. The 
next two sections use runs with optimal code parameters. In 
particular. Sec. 5 shows the results for halo clustering and 
how they can be improved by re-calibrating halo masses. In 
Sec. 6 we test the performance of COLA in redshift space. We 
summarize and discuss our findings in Sec. 7. 

2 SIMULATIONS 

2.1 MICE-GC: the benchmark A-body run 

In this paper we use a full A-body run to benchmark the 
results of COLA: the MICE Grand Challenge (MICE-GC)® 
simulation. This simulation and its products have been ex¬ 
tensively validated, the dark-matter and halo outputs are de¬ 
scribed in Fosalba et al. (2015b) and Crocce et al. (2015). In 
addition, leasing maps are described in Fosalba et al. (2015a) 
while Carretero et al. (2015) and Hoffmann et al. (2015) 
detail the HOD implementation used to produce galaxies 

® More information is available at http://www.ice.cat/niice. 
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mocks and the higher-order clustering, respectively. MICE- 
GC evolved 4096® particles in a volume of (3072 h~^ Mpc)® 
using the GADGET-2 code (Springel 2005) assuming a flat 
ACDM cosmology with = 0.25, flA = 0.75, Q.b = 0.044, 
Us = 0.95, (Tg = 0.8 and h — 0.7. This results in a particle 
mass of 2.93 x 10®° Mq. The initial conditions were gen¬ 
erated at Zi = 100 using the Zel’dovich approximation and a 
linear power spectrum generated with CAMB®". We make use 
of dark matter and halo catalogues of comoving outputs at 
a = 0,0.5, 1.0 and 1.5. 

Haloes were identified using a Friends-of-Friends (FoF) 
algorithm (Davis et al. 1985) with a linking length of 0.2. 

It is known that long-lived transients from the initial 
conditions affect the abundance of massive haloes and the 
clustering towards small scales, even for high starting red- 
shifts if the Zel’dovich approximation is used (Scoccimarro 
1998; Crocce et al. 2006). We have performed a set of dedi¬ 
cated gadget-2 Wbody simulations to investigate and cor¬ 
rect such effects. This is discussed in Appendix A. Correc¬ 
tions are < 2% for 2-pt matter clustering and 2 — 5% for halo 
abundance on the regime and redshifts of interest. Hence 
whenever we compare to matter power spectra and mass 
functions in MICE-GC, we take into account such correc¬ 
tions. We find that the transient effects are below the 1 per 
cent level for halo clustering, so we consider that we can 
safely neglect the correction for those measurements. 

2.2 Parallel COLA code 

We use a parallel version of COLA developed by J. Koda for 
the WiggleZ survey (Kazin et al. 2014), and kindly made 
available to us. It includes both the generation of random 
Gaussian initial conditions using 2LPT (Crocce et al. 2006) 
and the FoF halo finder (Davis et al. 1985) running on the 
fly. Particle forces are computed using a Particle Mesh (PM) 
code with a grid that is finer than the mean inter-particle 
distance. We shall refer to this as the PMgrid factor, which 
Tassev et al. (2013) suggest to set to 3 in order to adequately 
resolve small mass haloes, meaning that in total there are 3® 
more grid cells than particles (the total number of cells in the 
grid is number particles x PM®,,;,^). At each time step, four 
fast Fourier transforms (FFT)® are needed: one to convert 
the density field to Fourier space and three for transforming 
back to real space the forces in each of the spatial dimen¬ 
sions. We have added some new capabilities to the code such 
as the generation of matter power spectra on-the-fiy and the 
parallel storage of the matter density field interpolated onto 
large mesh grids (in particular we use 1024® cells using the 
Gloud-in-Cell assignment). 

2.3 Simulation suite 

In order to investigate optimal code parameter set ups we 
implemented several COLA runs. They were performed on the 
MareNostrum supercomputer at BSG®. We used the same 
cosmological model, the same linear power spectrum and 

^ http://cainb.info 

® The code relies on the FFTW package in its MPI version for 
distributed memory parallelization (http://www.fftw.org/). 

® Barcelona Supercomputing Center (http://www.bsc.es). 


same particle mass as MICE-GG (except in cases where we 
are interested in exploring mass resolution effects). Most of 
the runs used 2048® particles in a box size of 1536 Mpc, 
that is, a factor 8 smaller volume than MICE-GC. It pro¬ 
vides nonetheless a very large cosmological volume, but for 
some particular runs of interest we have produced additional 
realizations in order to reduce sampling variance. A typical 
run uses 1024 cores, with maximum memory consumptions 
of 2.6Tb for PMgrW = 3 and takes around 40 minutes wall- 
clock time for 40 time steps. This means that the CPU time 
consumed is less than 1 khour, to be compared with the 3 
Mhours that needed MICE-GC having 8 times more parti¬ 
cles, which gives a speed-up factor between two and three 
orders of magnitude with respect to a full A^-body simula¬ 
tion with the same number of particles. 

The code parameters we have used for the runs are listed 
in Table 1. We have varied the PMgrW factor, the number of 
time steps, the time sampling and the initial redshift. Addi¬ 
tionally, we explore the effect of decreasing the mass resolu¬ 
tion by decreasing the number of particles while keeping the 
box size constant. We also reduce the box size while keeping 
the particle load constant for a better mass resolution. All 
runs use the same seed number for the generation of the ini¬ 
tial conditions (except for those which add more realizations 
to the same parameter configuration), what cancels out cos¬ 
mic variance between different simulations (but not with re¬ 
spect to MICE-GC, which uses a different box size). In what 
follows, we shall assume a default set of parameters, un¬ 
less stated otherwise: 2048® particles, Lbox = 1536 h ® Mpc, 
PMgrid = 3 and a distribution of time steps linear in the 
scale factor. 

2.4 Scaling of computational requirements 

We explain in this section how the run-time and memory 
consumption of a single COLA realization scale with code pa¬ 
rameters, so that combined with the information provided 
in the previous subsection one can extrapolate the numbers 
to other configurations. 

The initial redshift and the time sampling distribution 
have no effects at all on the computational requirements. 
The run-time is largely dominated by the computations of 
FFTs during force evaluation at each step. For our default 
configuration and 40 time steps, the code spends only 10 
per cent of the time in computations not related to the PM 
algorithm, such as the initial set up and I/O. For this reason, 
the run-time increases roughly linearly with the size of the 
FFT and the number of time steps. Such transforms scale 
as oc Cl(n logn), where n is the number of grid points, and 
since these are proportional to PM®^i,j we have that the run¬ 
time scales roughly as oc PM®,,^^ (for large numbers we can 
neglect the logn factor). 

Given a constant number of particles, the memory con¬ 
sumption depends only on the size of the force mesh. The 
allocation of memory from the PM part represents around 
60 per cent for PM^^d = 3 and it scales as PM®^ijj. 

2.5 Power spectrum and mass function 
measurements 

We determine the matter and halo power spectra interpo¬ 
lating the particles into a cubic grid of 1024® cells via a 
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^realizations 

Particle 

number 

Lhox 

PMgrid 

^steps 


Time steps 
distribution 

1 

1024^ 

1536 

3 

10 

9 

oc a 

1 

1536® 

1536 

3 

10 

9 

oc a 

1 

2048® 

1536 

2 

10 

9 

oc a 

2 

2048® 

1536 

3 

10 

9 

oc a 

1 

2048® 

1536 

3 


9 

oc a 

1 

2048® 

1536 

3 

40 

9 

oc a 

1 

2048® 

1536 

2 

40 

19 

oc a 

48 

2048® 

1536 

3 

40 

19 

oc a 

1 

2048® 

768 

3 

40 

19 

oc a 

8 

2048® 

1536 

3 

40 

39 

oc a 

1 

2048® 

1536 

3 

40 

39 

oc logo 

1 

2048® 

1536 

3 

40 

39 

oc o®'® 

1 

2048® 

1536 

3 

40 

100 

oc a 


Table 1. Code parameters used in this paper. The box sizes are in units of h~^ Mpc. We underline parameter values that are distinctive 
for a run and the highlighted row corresponds to the optimal setup, which provides the best accuracy. For the latter, each realization 
needed 1024 cores during 40 minutes and 2.6Tb of memory. In Sec. 2.4 we discuss how to extrapolate those computational requirements 
to other parameter configurations. The total number of cells in the force mesh is: number particles X 


Sample 

Mass range [log(M//i 

Ml 

12.5 - 13.0 

M2 

13.0- 13.5 

M3 

> 13.5 


Table 2. Halo mass samples used throughout this paper, defined 
by halo mass, at 2 = 0, 0.5 and 1. 

Cloud-in-Cell (CiC) assignment. We then obtain the density 
in k-space by doing a FFT and estimate the band power by 
averaging the square over the range of modes correspond¬ 
ing to a k-bin: P{k) = (|(5fe|^). The mass assignment into a 
finite size grid introduces a filtering artifact that we com¬ 
pensate by deconvolving the CiC window function, which in 
Fourier space is simply a division. We correct for aliasing 
effects due to the finite sampling as in Jing (2005). Lastly, 
monopole measurements of halo power spectrum are cor¬ 
rected for shot-noise assuming a Poisson noise. 

For haloes, we restrict the analysis to those having more 
than 100 particles, corresponding to M > 10^^ ® h~^ Mq for 
most of the runs. Halo masses are defined using the War¬ 
ren correction (Warren et al. 2006): M = m.pN{l — 
where nip is the particle mass and N the number of parti¬ 
cles. This is irrelevant for most of the runs that use the same 
mass resolution as MICE-GC, but provides better agreement 
for lower mass resolution runs. We build three halo samples 
according to the mass cuts listed in Table 2. Mass function 
measurements contain error bars estimated by Jack-knife 
re-sampling using 64 different cubic sub-volumes and only 
mass-bins whose relative error is less than 5% are shown 
(see e.g. Fig. 1). 

3 LIMITATIONS OF 10 TIME STEPS 

The COLA method is designed to use very few time-steps, so 
that a high speed-up of more than two orders of magnitude 


with respect to a full A-body is achieved (Tassev et al. 2015). 
In this section we explicitly test the accuracy, as a function 
of scale and redshifts, of the original COLA configuration of 
10 time steps with scale factor linearly distributed between 
redshift 9 and 0 and a PMgrid factor of 3. 

With only 10 time steps, the matter power spectrum is 
accurate at the 5% level to fe ~ 0.5/iMpc“^ (see Sec. 4.1). 
This allows the exploration of non-linear scales, but only to 
some limited extent. At the halo level, however, the situation 
is more complex. Figure 1 shows that the mass function is 
severely underestimated at 2 = 1, especially at large masses. 
The problem is not so visible at 2 = 0.5, where the disagree¬ 
ment is at most at the 15% level and at 2 = 0 it is within the 
5% for all masses. Likewise, we have checked that the halo 
bias is overestimated by as much as 20% at 2 = 1 and agrees 
within few percent level at lower redshifts. Both effects (un¬ 
derestimation of the mass function and overestimation of the 
halo bias) can be explained by a halo mass underestimation 
at high redshift, when the evolution has been computed with 
very few time steps. 

We suggest that these problems could come from a 
higher difficulty of achieving relaxation inside haloes when 
the time discretization is too coarse. Particles evolve accord¬ 
ing to the mean gravitational potential that arises from the 
smooth distribution, but are also affected by individual en¬ 
counters. The relaxation time is related to the moment when 
the latter start to significantly contribute to the dynamics, 
boosting the re-distribution of kinetic energy and achiev¬ 
ing a dynamical equilibrium in the system (Dehnen & Read 
2011 ). 

Each time step is a chance for particles to interact 
with each other, but if we reduce them drastically the re¬ 
distribution of energy is unphysical suppressed. This is crit¬ 
ical for those haloes that have not relaxed yet. Since the 
relaxation time is proportional to the number of particles of 
a halo (Binney & Tremaine 1987), the effect is larger for high 
mass haloes. The halo formation time increases and merg- 
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Figure 1. Mass function when using COLA with 10 time steps 
starting at Zi = 9. Solid, dashed and dotted lines correspond to 
2 = 0, 0.5 and 1 respectively. At high redshift there is a large 
discrepancy that is partially solved at 2 = 0.5. 


ing processes are poorly captured, producing halo masses 
largely underestimated for 2 > 0, before 10 time steps have 
been completed. Note that full Ai-body codes with adap¬ 
tive time steps schemes trigger finer time samplings at high 
density regions and halo formation is properly tracked. 

To visually confirm this idea, we show in Fig. 2 the halo 
distribution of two runs with 10 and 40 time steps in red 
open and blue hlled circles, respectively. The initial condi¬ 
tions and the rest of parameters are kept the same, so that 
differences are due only to the number of time steps. In the 
left panel, we confirm that at 2 = 1.0 massive haloes are 
in general not properly tracked with 10 time steps and they 
seem to appear fragmented as smaller mass haloes. And not 
all of the low mass haloes are identified. Nevertheless, in 
the right panel at 2 = 0, when 10 time steps have been 
completed, the agreement is much better on halo masses, 
positions and abundance at all masses. This visual inspec¬ 
tion suggests that one needs > 10 time steps before halo 
properties converge at the redshift of interest. 

We find that relaxation effects when only 10 time steps 
are used can be reduced using a higher particle mass (e.g. 
above 10^^ h~^ M©). The number of particles of the haloes 
decreases and, therefore, their relaxation time as well. We 
checked that disagreements in the mass function and halo 
clustering are indeed lower, but this apparent improvement 
is however lost for other statistics where mass resolution is 
important. 

Evolving particles with just ten time steps before the 
redshift of interest, therefore, provides accurate results only 
at large scales (k < 0.3/iMpc“^) and low redshifts. We 
might have stronger requirements that clearly push to go 
beyond 10 time steps to surpass those limitations. 


4 OPTIMIZATION 

A gravity solver algorithm discretizes both temporal and 
spatial dimensions (and the mass as well) in order to nu¬ 
merically solve for the dynamical evolution. The idea behind 
COLA is to reduce numerical computations as much as pos¬ 
sible while still capturing the growth of structure on large 
scales. This can be controlled with few internal code param¬ 
eters: the number of time steps, the time sampling distri¬ 


bution, the initial redshift and the size of the force mesh 
grid, in combination with the mass resolution and/or the 
box size. Note however that COLA is not fundamentally dif¬ 
ferent from a full A-body in the sense that as one increases 
the requirements on such parameters the numerical integra¬ 
tion of particle trajectories becomes more accurate and COLA 
would eventually converge to a full A-body. 

In this section we explore the code parameter space in 
order to understand their impact on observables and deter¬ 
mine regions that provide optimal results in terms of ac¬ 
curacy versus running time (or memory usage). We assess 
the performance by comparing the COLA dark matter power 
spectra up to A: ~ l/iMpc”^, and halo mass functions for 
M > 10^^ ® h~^ Mq, against those in the reference A-body 
run. A key difference from previous works (Kazin et al. 2014; 
Hewlett et al. 2015b; Leclercq et al. 2015) is that we aim at 
reproducing those observables simultaneously across a large 
redshift range (0 < 2 < 1). As we will see next, the require¬ 
ments in this case are more stringent that those needed for 
a single redshift output or halo mass bin. 


4.1 Number of time steps 

The first parameter we vary is the number of time steps, 
with the initial redshift fixed to 9. The upper panel of Fig. 
3 displays the 2 = 0 matter power spectrum for COLA runs 
with increasing number of time steps divided by the one 
measured in MICE-GC. The characteristic scale at which 
the ratio deviates from unity is progressively shifted towards 
higher wave-numbers as more time steps are included: there 
is a 2 per cent agreement up to scales of fc ~ 0.4, 0.6 and 
O.S/iMpc”^ for 10, 20 and 40 time steps respectively. This 
can still be improved adjusting the initial redshift according 
to the number of time steps (see Sec. 4.3). In particular, we 
check that doubling the number of time steps almost doubles 
the characteristic wavenumber where the power spectrum is 
significantly underestimated. This in turn means that there 
is room for higher accuracies with more than 40 time steps, 
although presumably the force mesh resolution would then 
soon become a limiting factor. We also find that these re¬ 
sults for the matter power spectrum are to a good extent 
independent of the redshift analyzed. 

However, for the mass function there is a higher sensitiv¬ 
ity to the number of time steps at high redshift. As shown 
in the lower panel of Fig. 3, the large underestimation of 
the mass function at 2 = 1 is solved by doubling the total 
number of time steps. With 20 time steps, 10 of them are 
computed before the redshift of interest. The abundance at 
high masses further increases by ~ 5 per cent when moving 
from 20 to 40 time steps and the mass function is consistent 
with the one from MICE-GC. 

At redshift 0 and 0.5 the mass function also in¬ 
creases with the number of time steps at masses above 
~ although more moderately. Moving from 

10 to 20 time steps, the mass function augments by 5 — 10% 
at the high mass range and from 20 to 40 by < 5%. At low 
masses, differences remain within 1 per cent for 20 and 40 
time steps. We conclude, therefore, that the low mass regime 
of the mass function converges for 20 time steps but that 40 
are necessary for the most massive haloes. In Appendix B we 
show that, even with as many steps as 20 or 40, the 2LPT 
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Figure 2. Spatial distribution of haloes in two different COLA runs that differ only in the number of time steps: 10 are represented by 
open circles and 40 by filled ones. Different panels show redshifts 1.0, 0.5 and 0 from left to right. The slices have a width of 50 h~^ Mpc 
and are 25 h~^ Mpc thick. The radii of circles are proportional to and match the r 2 oo values, so that they reflect the physical size 

of haloes. Only those with more than 200 particles are shown (which corresponds to 6 X 10^^ h~^ Mq). The largest halo a.t z = 0 has a 
mass of 1.87 X 10^® h~^ Mq for a run with 40 time steps and a 4.5% less for 10 time steps. At z = 1 the matching between the runs is 
poor, with the abundance under-estimated by 50% or more with 10 time-steps. The agreement improves as one approaches z = 0. 


contribution in the COLA method is still key to achieve ac¬ 
curate results, as compared to PM only simulations. 

4.2 Time sampling distribution 

The scale factor a is the variable used to discretize the tem¬ 
poral axis in regular time steps. For that, we can choose a 
time sampling function /(a) and distribute n steps in inter¬ 
vals of constant width A/(a), 

A/(a) = /(«/^ - ^ (3) 

where fli is the initial scale factor and a/ = 1 the final. 
If the resulting Aa << 1 then Aa ~ [/'(a)]“^A/(a). For 
the linear case, we simply have /(a) = a and the step width 
Aa is constant. We can define the step density as the inverse 
of the step width: p = i/ao. Since Af{a) is constant, then 
poc /'(a). 

In Sec. 4.1 we showed for the linear case how increas¬ 
ing the step density (which in that case is only set by the 
number of time-steps) improves the accuracy of the simu¬ 
lation, but at the expense of a higher computational cost. 
In this section we explore which function /, and the corre¬ 
sponding step density p oc produce a step distribution 

that is balanced in terms of accumulated errors over time. 
In Fig. 4 we show the step density for four different choices 
of the time sampling function. We distribute 40 time steps 
between Zi = 39 and 2 = 0 using a linear (circles), logarith¬ 
mic (squares) or power law function f{a) = (diamonds 
and triangles for p = 0.5 and p = 0.8 respectively). 

A logarithmic time sampling is useful in full A-body 
codes for global time steps that affect all particles. But 
these algorithms in general implement adaptive time step¬ 
ping schemes for individual particles that sample more accu¬ 
rately the time evolution when non-linearities start to grow. 
In the implementation of COLA we are using there is no such 
refinement at late times and we see in Fig. 4 that the loga¬ 


rithmic choice oversample the early evolution of the particle 
distribution at the price of a low step density at later times. 
On the other hand, the linear case has large relative varia¬ 
tions on the scale factor during the first steps, which might 
presumably lead to larger inaccuracies. For that reason we 
have considered the power law function to sample interme¬ 
diate situations if 0 < p < 1. 

In Fig. 5 we compare the matter power spectrum for 
three time sampling functions: linear, a°'® and logarithmic 
in solid, dashed and dotted lines respectively. Upper and 
lower panels correspond to redshifts 0 and 1 respectively. 
We see that any gain we might have at high redshift by con¬ 
centrating there more time steps is lost as soon as the step 
density decreases later. This is evident for the logarithmic 
case, which has the lowest step density for 2 < 2. In partic¬ 
ular, at 2 = 0 the power spectrum is close to that for the 
case of 10 time steps linearly distributed (see dotted line in 
the upper panel of Fig. 3) and indeed they have a similar 
step density. The distribution of time steps using the func¬ 
tion a° ® provides better results at high redshift, but falls 
below the linear distribution for lower redshifts where it has 
a lower step density. 

Therefore, an optimal distribution should have a step 
density without strong variations and we conclude from the 
measurements that the linear case offers the best global per¬ 
formance at all redshifts. Although large relative variations 
on the scale factor during the first steps could lead to inac¬ 
curacies, this is balanced by the fact that at early times the 
dynamics is close to linear and can be well approximated by 
the 2LPT evolution. Hence a better time sampling at the 
beginning is not as critical as for low- 2 . Since all these ar¬ 
guments are built based on relative variations of the step 
density across time, the conclusions are independent of the 
absolute number of time steps and the initial redshift. There¬ 
fore in the rest of the paper we shall adopt a linear time¬ 
stepping distribution. 

Lastly, we can use the concept of step density to frame 
the results from Sec. 4.1 for the abundance of haloes. We 
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Figure 3. Matter power spectrum in real space at 2 = 0 (up¬ 
per panel) and mass function at 2 : = 1.0 (lower panel) for three 
different choices for the number of time steps: 10, 20 and 40 in 
dotted, dashed and solid lines, respectively. The initial redshift is 
fixed at Zi = 9. An increase in the number of time steps directly 
translates into a better accuracy at small scales. As for haloes, 
a high number of time steps is necessary to correctly predict the 
mass function at high redshift. 


find that whenever the step density is low (/9<20), the 
mass function suffers an underestimation for masses above 



Figure 4. Step density (or the inverse of the step width) as a 
function of the scale factor for different schemes of time sampling: 
linear (circles), logarithmic (squares) and using a power of the 
scale factor, where for the latter we show two cases with exponents 
0.5 (diamonds) and 0.8 (triangles). In all cases we distribute 40 
steps between Zi = 39 and 2 = 0 and the markers are located at 
the position of each step. 



4.3 Initial redshift 

The optimal selection of the initial redshift is coupled with 
the number of time steps (but it is independent of the time 
sampling distribution, as we have already shown). 

A first guess we can do is to set the initial scale factor 
equal to the step width, which for 10 time steps gives 2 i = 9 
(Tassev et al. 2013). Starting later would introduce more 
transient effects whereas doing it earlier would produce large 
relative variations in the scale factor in the first time step, 
which seems not optimal. So there is not much room for 
optimization using few time steps. Instead we now focus on 
the situation in which we have 40 time steps. 

Using the same rule as before we can estimate a good 
guess as Zi = 39. In Fig. 6 we show the resulting matter 


Figure 5. Matter power spectrum for the following time sam¬ 
pling functions: linear (solid line), a®'® (dashed) and logarithmic 
(dotted). All runs contain 40 time steps starting at Zi = 39. The 
upper and lower panels show redshifts 0 and 1, respectively. The 
case that performs better at all redshifts is the linear one. 

power spectrum at 2 = 0 when the initial redshift is varied 
from 9 to 100. A low value of 2 i = 9 yields transient effects at 
all wave-numbers with an amplitude up to one per cent. The 
rest of cases are almost indistinguishable, only for Zi = 100 
there is slightly less power at fc ~ l/iMpc“^. On the other 
hand, we detect that the mass function is underestimated 
at low masses if the initial redshift is too high. One possi¬ 
ble explanation is that for high starting redshift the density 
contrast in the initial conditions are too smooth and then, 
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Figure 6. Matter power spectrum at z = 0 as a function of the 
initial redshift. All runs distribute 40 time steps linearly along 
the scale factor. A too low starting redshift produces transients 
at non-linear scales. The cases with Zi > 19 are almost indis¬ 
tinguishable from each other, except for a slightly less power for 
Zi = 100 at fc ~ lhMpc“^ (see text for details). 

due to the coarse time sampling in COLA, the smallest density 
peaks that seed small mass haloes are blurred. This pushes 
towards using an initial redshift not as high so fluctuations 
are larger, even if they are given by 2nd order in LPT. 

Nonetheless the dependence we observe on this param¬ 
eter once Zi ~ 20 — 40 is weak, since in practice only affects 
the position of the first(s) time step. But given the slightly 
better performance on the mass function of Zi = 19 when us¬ 
ing 40 time steps, we adopt this value as our fiducial choice 
in what follows. 

4.4 Force mesh grid size 

Previous subsections were devoted to parameters that de¬ 
fine the temporal discretization of the simulation. We use 
now the most accurate configuration (that is, 40 time steps 
linearly distributed from Zi = 19) to study the effects of the 
spatial discretization in the force computations. In partic¬ 
ular, we compare runs with factors of 2 and 3. We 

note that this parameter is of particular relevance as it has 
a large impact in the computational cost of the runs. For ex¬ 
ample, PMgrid = 2 allows a saving of 70% of the computing 
time and 30% of the memory consumption with respect to 
PMgrid = 3. 

We find that changing from PMgrid = 3 to 2 

only changes the matter power spectrum by ~ 1% for 
fc < 1 fiMpc”^. However there is a more important effect on 
the halo mass function. For PMgrid = 3, what corresponds 
to a comoving cell size of 0.25Mpc given our box-size 
and particle load, we recover a mass function within 5% 
of the one measured in MICE-GC down to 10^^'®M©. 
This is shown in Fig. 7 with a solid blue line. According to 
the halo model this mass scale corresponds to a halo size of 
~ 2 h~^ Mpc (e.g. Scoccimarro et al. 2001). This means that 
in order to resolve a given halo mass scale at such accuracy 
we need a minimum of roughly 8 cells to sample its typical 
halo size. Otherwise the haloes are puffy and might not col¬ 
lapse and be resolved. For PMgrid = 2 the force evaluation 



Figure 7. Mass function at z = 0.5 using a PMgrid = 2 and 3 
(dotted and solid lines, respectively). Both runs contain 40 time 
steps and have Zi = 19. Decreasing the size of the PM grid pro¬ 
duces a larger overestimation of masses for large haloes and in¬ 
creases the incompleteness for small haloes. 

cell size is ~ 0.375 h Mpc“^, given the above scaling it means 
that we should only resolve the mass function at ~ 5% for 
haloes more massive than ~ 10^^ h~^ Mq. This is in fact in 
good agreement with our findings in Fig. 7. 

A discrepancy in the mass function might have two 
sources: a genuine difference on the abundance or that halo 
mass estimates are systematically biased. The first case, and 
assuming that the difference is spatially homogeneous, does 
not produce differences in clustering for samples selected by 
mass cuts (see Table 2), while the second does. Since we do 
not detect any significant difference in halo clustering at the 
low mass range for different PMgrid factors, we infer that 
there is a completeness problem at those masses due to the 
size of the force mesh. Not all haloes that should form are 
detected in the simulation, and in a mass-dependent way. 

At high masses, on the contrary, we observe a lower clus¬ 
tering amplitude (~ 1 per cent at linear scales) for the run 
that produces a higher overestimation on the mass function. 
Both facts can be explained by a halo mass overestimation. 
One possible interpretation is that the puffier the haloes 
due to the force resolution, more easily the FoF algorithm 
bridges neighboring particles or small groups to a halo that 
really do not belong to it, hence systematically biasing high 
the mass estimate, as we observe. 

4.5 Optimal setup 

So far, we have given an exploration of the main code pa¬ 
rameters in COLA and their impact on the dark matter clus¬ 
tering and on halo abundance. To achieve percent accuracy 
on both quantities, at very least 10 time steps have to be 
done before the redshift of interest^°, which means that in 
total we might need 20 or more until z = 0. The more we do, 
the higher is the wavenumber where the dark matter power 
spectrum starts to miss power. This is true to at least ~ 40 
time steps, above that one should probably set PMgj-i^ > 3 
(for the reference mass resolution we use, 2.9 x 10^° h~^ M©), 

Using a particle mass of 2.9 X 10^'^ h~^ M©. 
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Figure 8. Comparison of the matter power spectrum in real space 
for the run using optimal parameters (40 time steps and Zi = 19, 
solid lines) and the default configuration (10 time steps and Zi = 
9, dotted lines). We show redshifts 0, 0.5 and 1. This optimal 
setup delivers a ~ 1 per cent accuracy at scales fc ~ 1 /iMpc“^. 

so that the force resolution does not limit the accurate sam¬ 
pling of power up to A: ~ l/iMpc“^. In Appendix B we 
show that still with these configurations, the COLA method 
yields better results than a PM only evolution. After we 
find that the linear time sampling distribution is the opti¬ 
mal one, regardless of the rest of parameters, and that for a 
large number of time steps best results are already achieved 
with an initial redshift of 19. A high PMgrid factor is re¬ 
quired for percent accuracy in halo abundance and matter 
clustering and thus we set PMgrw = 3 despite its relatively 
higher computational cost. The prize of further increasing it 
is not well justified in terms of additional accuracy. 

A good choice of code parameters depends on which 
accuracy requirements need to be accomplished by the final 
mock catalogues. In this work, our target is to achieve per 
cent level accuracy on the matter power spectrum and halo 
abundance, in the wide ranges of 10^^'® — 10^®'° h~^ Mq in 
mass, scales up to fc ~ l/iMpc“^ and redshifts comprised 
between 0 and 1.5. 

Given these requirements we find that the best setup 
is set by 40 time steps linearly distributed along the scale 
factor, starting at Zi = 19 and with a PMg^id factor of 3. 
Fig. 8 shows the matter power spectrum using this configu¬ 
ration (solid lines) compared with the case of 10 time steps 
starting at = 9 (dotted lines) at redshifts 0, 0.5 and 1. 
At 2 = 0 (1), there is a 1 per cent accuracy up to fc = 0.8 
(1.3) hMpc“^. Regarding the mass function, the solid line 
in Fig. 7 depicts results at 2 = 0.5 for the optimal setup. 
For other redshifts the conclusions are quite robust, i.e., a 5 
per cent underestimation at M = 10^^ ® h~^ Mq and a ~ 5 
per cent excess for M > 10^®'® h~^ M©. 


5 HALO CLUSTERING 

In Sec. 4 we found an optimal configuration set-up for COLA 
by benchmarking the matter clustering and the halo abun¬ 
dance as a function of redshift against those measured in 
MICE-GC. We now study what that configuration implies 


for the clustering of haloes. The first row in Fig. 9 shows 
the halo-matter cross power spectrum in real space and 
without applying any correction to the catalogues. Differ¬ 
ent columns separate mass samples Ml, M2 and M3 (see 
Table 2) from left to right. Solid, dashed, dot-dashed and 
dotted lines display redshifts 0, 0.5, 1 and 1.5 respectively. 
The other two rows are explained in the following subsec¬ 
tions 5.1 and 5.2. We notice that there is a general ~ 2 per 
cent under-estimation of the clustering amplitude at all mass 
bins and redshifts. This constitutes a remarkable result: it is 
possible to predict the halo linear bias with an accuracy of 
~ 2 per cent without doing any correction nor the necessity 
of calibrating against a reference full A-body simulation. 

Note, however, that we have also found evidence that 
halo masses are biased. Hence, we now explore two different 
corrections on the mass with the aim of reducing further the 
deviations in the halo bias to the < 1% level. One is based 
on fitting abundances (abundance matching. Sec. 5.1) and 
the other on fitting clustering (bias re-calibration. Sec. 5.2). 

5.1 Abundance Matching 

The cumulative halo mass function gives a monotonic re¬ 
lationship between the mass and the abundance of haloes. 
Biased halo mass estimates makes this function in COLA to 
have deviations with respect to a reference A-body simula¬ 
tion. If we have an external fiducial mass function (coming 
from a full A-body simulation for instance), we can re-assign 
the halo masses in the catalogue so that the reference abun¬ 
dance is fitted. When the incompleteness is negligible, we ex¬ 
pect this calibration to greatly reduce disagreements among 
both catalogues if the ranking of halo masses has the correct 
ordering. If the incompleteness is present, there are missing 
entries in the catalogue and trying to match abundances will 
not produce the desired effect but a mixing of haloes with 
different clustering properties. 

The second row in Fig. 9 shows the halo bias after cor¬ 
recting halo masses by abundance matching, using the mea¬ 
sured mass function in MICE-GG as reference. The small 
disagreements in the top panels are greatly corrected in mass 
samples M2 at 2 > 0 and M3 at all redshifts, but not in 
Ml. This is consistent with the impact of incompleteness 
in the mass sample described above: abundance matching 
works well as long as the incompleteness is not present, i.e. 
M > 10®® Mq (see the solid line in Fig. 7). 

We have tested as well the capabilities of the abun¬ 
dance matching for runs using only 10 time steps, in which 
the “uncorrected” mass function is highly under-estimated 
at 2 = 1 and the halo bias deviates by ~ 20 per cent (see 
Fig. 1). After abundance matching, the bias is recovered at 
the 3 per cent level for all mass bins and redshifts, but only 
for k < 0.5hMpc“®, what illustrates that mass calibration 
performs worse when non-optimal parameters are used in 
COLA. 


5.2 Halo bias re-calibration 

We now explore an alternative mass re-calibration that is 
targeted to fit the halo bias. Note in the first row of Fig. 9 
the COLA run (with optimal set-up) yields always a residual 
bias mismatch of 2 per cent, regardless of the mass sample 
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Figure 9. Halo-matter cross power spectrum. Rows correspond to different halo mass corrections: original mass, after abundance 
matching (see Sec. 5.1) and after bias re-calibration (see Sec. 5.2) from top to bottom. Columns separate halo samples Ml, M2 and M3 
from left to right. The original catalogue has a bias underestimation of ~ 2 per cent in all cases. The abundance matching performs well 
at high masses while the bias re-calibration is able to achieve a 1 % agreement on large scales (i.e, k < 0.3 — 0.4hMpc“'^) and 2 % at 
intermediate scales (0.4 < k < 0.7/rMpc“^). 


and redshift. We can use this fact to build an alternative 
correction independent of any parent simulation (assuming 
the 2% factor is roughly independent of cosmology). In the 
framework of the halo model (Cooray & Sheth 2002) halo 
bias and halo mass are related through a function that only 
depends on cosmology b = b{M). Thus, to first order, a 
fractional reduction in the bias of 51nfe can be recovered 
with a shift in halo mass In M —>■ In M — <5 In M given by, 

In what follows we set dlnfc = 0.02 (the bias calibration 
value we found) and evaluate the derivative in Eq. 4 at the 
corresponding mass and redshift using the bias prediction 
from Sheth & Tormen (1999) but we have checked that other 
fitting functions provide similar results. 

The recovered halo bias values after doing such mass 
re-calibration are shown in the third row in Fig. 9. Now 
the agreement with MICE-GC is within 1% up to scales 
k < 0.5/iMpc“^ for all redshifts and masses. However, the 
correction is not working perfectly for the mass sample M3, 


where it is sub-percent up to fc < 0.3/iMpc“^ but yields 
an over-estimate of ~ 2% beyond. We believe this could 
be due to the limited accuracy of the bias predictions com¬ 
ing from the theory of the peak background split (Manera 
et al. 2010), used to evaluate Eq. (4). Provided with a bet¬ 
ter bias prescription (or maybe the bias-mass relation mea¬ 
sured from a reference N^-body itself) one would expect the 
bias re-calibration to give very good results by construction. 
Nonetheless the accuracy remains within 1% for most cases 
and deviations from that are small. 


As was the case for the abundance, this correction solves 
disagreements due to biased halo masses but not due to 
incompleteness. The over-abundance at high masses is re¬ 
moved but the underestimation at low masses persists and 
even increases. Despite that, haloes have the right clustering 
amplitude. 
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6 REDSHIFT SPACE 

We now turn into discussing the performance of COLA for 
reproducing observables in redshift space, as this is what 
is actually observed in large scale structure galaxy surveys. 
Redshift space positions s are obtained by, 


where r is the position in real space, Vr is the peculiar veloc¬ 
ity along the line of sight direction, a is the scale factor and 
H = a~^da/dt the Hubble expansion rate. For concreteness 
we will focus in halo power spectrum multipoles and assume 
the plane-parallel approximation, that is, fixing the line of 
sight to one of the three Cartesian axes. 

In order to reduce the statistical errors in higher order 
multipoles we have produced 48 COLA runs using the optimal 
setup described in Sec. 4.5. We split the halo catalogue in 
the 3 mass bins as in Table 2, with halo masses re-calibrated 
using the bias method with 51n6 = 0.02 (see Sec. 5.2). 

Figure 10 shows the mean of the monopole (1=0) and the 
quadrupole (1=2) over the suite of COLA runs divided by the 
corresponding quantities measured in MICE-GC. Rows sep¬ 
arate redshifts 0,0.5 and 1 from top to bottom and columns 
mass bins Ml, M2 and M3 from left to right. 

For reference we recall the large-scale limit expressions 
for these quantities (Kaiser 1987) assuming a simple linear 
bias model, 

PUMk)= iC + lbf + lf)p^^{k), 

PUMO = ilbf + If) PLfk), (6) 

where h is the bias and / = the linear growth rate. 

At large scales {k < 0.3/iMpc“^), the agreement with 
MICE-GC is within 1 per cent for the monopole. Recall that 
we are using bias-recalibrated masses what ensures that the 
halo clustering is well reproduced in real space. And this con¬ 
tribution is the leading order for the monopole in redshift 
space, on large scales (i.e. the CPIam term in Eq. 6). Had 
we used the actual COLA halo masses instead we would have 
obtained biases off by 2 per cent and the monopole underes¬ 
timated by at least ~ 4%. In turn, the quadrupole in Fig. 10 
is systematically overestimated by ~ 2 per cent across all 
mass bins and redshifts {k < 0.3/iMpc“^). On large scales, 
the leading order contribution to the quadrupole is the cross¬ 
correlation between halo densities and halo velocities, i.e. the 
term in Eq. 6. This means that any inaccuracies in 

reproducing the velocity field by COLA will have a direct im¬ 
pact in the quadrupole. For instance we have checked that 
the differences on large-scales can be corrected by reduc¬ 
ing by 2 per cent each halo velocity (what would amount 
to reduce the overall bulk flow). We over plot (without er¬ 
ror bars) the monopole and the quadrupole with dotted and 
short dashed lines respectively after applying such velocity 
correction. As expected, the quadrupole is now perfectly in 
agreement at large scales and the monopole remains almost 
unaltered. 

At smaller scales {k > 0.3hMpc“^) we observe larger 
discrepancies. The monopole is underestimated, specially at 
high masses, and the quadrupole is overestimated. We be¬ 


lieve this is due to the details of the full velocity PDF^^ but 
we do not attempt to tune the results further to those of 
MICE-GG as i) the results on these scales will eventually 
depend on the galaxy sample under consideration and ii) 
these are scales that start to be smaller than those used in 
standard large-scale structure probes such as BAG. For in¬ 
stance, small-scale corrections can be postponed to a later 
stage when haloes are populated with galaxies using an HOD 
prescription, in which the velocity dispersion can be fitted 
to have agreement with observations. For reference, we have 
checked that adding a dispersion component to the halo ve¬ 
locities drawn from a Gaussian distribution with a width 
of ~ 35kms“^ and zero mean reduces the quadrupole for 
k > 0.3/iMpc“^ and is then in agreement within 2 per cent 
for most scales, mass samples and redshifts, whereas the 
monopole is not substantially affected. 

Figure 11 shows the equivalent of Fig. 10 but for the 
hexadecapole (I = 4). We find that our optimal configu¬ 
ration for COLA yields an excess of ~ 50 per cent at large 
scales, while for k > 0.2hMpc“^ the agreement is signifi¬ 
cantly improved, down to ~ 20 per cent. These differences 
are not significantly changed when the velocity correction of 
2 per cent is applied. If we further add an ad-hoc velocity 
dispersion term, as discussed above for lower multipoles, we 
achieve an agreement within 10 per cent at small scales. 


7 CONCLUSIONS 

COLA (Tassev et al. 2013, 2015; Hewlett et al. 2015a; Koda 
et al. 2015) is a method that represents a step forward in 
the production of fast and accurate mock catalogues for 
galaxy surveys. It allows speed-ups of almost three orders 
of magnitude with respect to full accuracy A-body simula¬ 
tions (depending on the particular choice of code parameters 
used) while still reproducing their large-scale structure ob¬ 
servables to within few percent. COLA is based on a combined 
scheme where particle trajectories are evolved according to 
LPT with residual displacements to those trajectories com¬ 
puted by a high resolution PM A-body solver. 

It should be noted that the COLA algorithm can get arbi¬ 
trarily close to a full A-body by demanding more accuracy 
in the time and space integration, at the expenses of in¬ 
creasing the computational cost. The key is then to find the 
optimal set-up in the code internal parameter space to bal¬ 
ance the highest accuracy in reproducing observables while 
maintaining low computational requirements. 

In this paper we have determined the optimal configura¬ 
tion of COLA , that we shall name ICE-GOLA for reference, 
by systematically exploring the code parameter space. This 
optimization uses as target observables the two-point clus¬ 
tering of matter and the abundance of haloes across a wide 
range in halo mass and cosmic time. For this purpose, we 
have developed a suite of 2048® particles COLA simulations 
that sample the internal code parameter space and compared 

For example, we have measured the halo 1-dimensional veloc¬ 
ity distribution and found that the fraction of haloes with center 
of mass velocities larger than 500 km/s is slightly underestimated 
by few percent in COLA (the exact number varies for mass samples 
and redshift), although the halo velocity rms agrees within 1 per 
cent with MICE-GC. 
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Figure 10. Monopole {I = 0) and quadrupole (I = 2) of the halo power spectrum in redshift space (solid and dashed lines respectively) in 
COLA vs. the MICE-GC Al-body simulation. Different rows correspond to redshifts 0, 0.5 and 1 from top to bottom and columns separate 
mass samples from left to right (halo masses have been corrected by the bias calibration method). Monopoles have been corrected for 
shot-noise. Measurements in COLA correspond to the mean over 48 runs using the optimal setup. At large scales (fe < 0.3/iMpc“^) the 
agreement is within 1 per cent for the monopole and 2% — 3% for the quadrupole. Dotted and short-dashed lines are the monopole and 
quadrupole after reducing halo velocities by 2 per cent, what brings the latter to better agreement while leaving the former unchanged. 


such observables to a reference state-of-the-art Af-body sim¬ 
ulation, the MICE-GC. For the optimal set-up we then stud¬ 
ied the halo clustering in real and redshift space. The most 
relevant parameters were the number of time steps of inte¬ 
gration before the desired redshift output and the spatial 
resolution of the force computation. 

Number of time steps. It has a direct impact in the min¬ 
imum well resolved scale. Integrating 40 time steps yields 
a matter power spectrum with one percent accuracy up to 
O.ShMpc”^ — 1.3/iMpc“^ in the range 0 < 2 < 1. This is 
a large improvement over the standard 10 time steps that 
yields such accuracy only for k < 0.3/iMpc“^. In turn, halo 
masses (and mass function at a given mass) are largely 
underestimated if less than ~ 10 time steps have passed 
by, especially for massive haloes (M > 10^®'®M©). 
Increasing to 40 time steps we find the mass function to 
agree with MICE-GC to within 5% for M > 10^^'® h~^ M© 
and 0 < 2 < 1 (assuming PMgrW = 3, see below). There¬ 
fore, for an accurate prediction of matter clustering and 
halo abundance across cosmic time, one needs to increase 


the default 10 time-steps by a factor of a few. Above 
40, however, further gains might be limited by the force 
resolution. We show in Appendix B that, at least up to 40 
time steps, the COLA method is still preferred over a PM 
only simulation. 

Force mesh grid size. We find that in COLA there is an 
incompleteness in abundance at low halo masses (at < 300 
particles per halo, that corresponds to M < 10^® h~^ M© for 
our reference mass resolution). This systematic effect can be 
mitigated by increasing the force resolution. For our optimal 
set-up with PMgrid = 3 we find the incompleteness at the 
5% level, see Fig. 7. Using a coarser grid of PMgrid = 2 re¬ 
duces significantly the computational cost but increases the 
incompleteness by a factor of two in the same mass range. 
In turn, the matter power spectrum is almost unaffected for 


Recall that we denote the grid size by PMg^^d and its value is 
given in units of the number of particles to the one third. 
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Figure 11. Mean halo power spectrum hexadecapole in COLA (after the correction in the halo masses by the bias re-calibration method 
with 5b/b = 0.02) compared to the one in MICE-GC. Different rows correspond to redshifts 0, 0.5 and 1 from top to bottom and columns 
separate mass samples Ml, M2 and M3 from left to right. For k > 0.2/iMpc~^ the agreement with the W^-body is at the < 20% level 
across redshifts and mass bins. The dotted line includes as well a reduction in halo velocities of 2 per cent (see text for details). 


fc < 1 ft Mpc ^. For an optimal balance of good accuracy and 
computational cost, we encourage using PMgrid = 3. 

In addition to the above, we have also explored the time 
sampling distribution and found that a linear distribution 
along the scale factor gives the best global performance be¬ 
tween redshifts 0 and 1 (see Fig. 5) regardless of the number 
of steps and the initial redshift. Lastly, transients effects 
from the initial redshift were found negligible once Zi > 19 
within COLA accuracy. Starting earlier had no clear benefits. 

Such an optimal set-up yielded halo bias within 2% to 
that in MICE-GC up to fc ~ 0.7 ZiMpc”^. In redshift space, 
monopole and quadrupole agree to < 4% to those in the 
A^-body {k < 0.4/iMpc“^). These conclusions hold for all 
redshifts and mass bins investigated, and do not rely on any 
re-calibration against an external A^-body, as it is usually 
done for COLA (Kazin et al. 2014; Ross et al. 2015; Howlett 
et al. 2015b; Leclercq et al. 2015). Note that other meth¬ 
ods for producing mock catalogues also depend to a certain 
degree on full N^-body simulations for calibrations (Monaco 
et al. 2002, 2013; Scoccimarro & Sheth 2002; Manera et al. 


2015; Coles & Jones 1991; White et al. 2013; Kitaura et al. 
2015; Chuang et al. 2015a). 

We have further improved the accuracy of COLA by in¬ 
vestigating two particular recalibration schemes, using the 
MICE-GC as a reference: 

Abundance matching. As shown in Fig. 9, it provides 
percent level accuracy in halo clustering at high masses 
for the run with optimal parameters. At low masses where 
incompleteness is present, it has no effect and a 2 per cent 
disagreement persists. The mass function is correct by 
construction. 

Halo bias re-calibration. This method yields percent level 
agreement in halo bias at all mass bins and redshifts for the 
run using optimal parameters (see bottom row in Fig. 9). Us¬ 
ing a better halo bias prediction could improve the correction 
for the largest mass bin even further. The abundance is cor¬ 
rected at high masses but at low ones (M < 10^^'® h~^ Mq) 
it remains under-predicted. 

Using the halo-bias recalibration we find the monopole 
and the quadrupole have an excess of 1 and 2 per cent, re- 
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spectively, on large-scales. The hexadecapole also features 
an excess of power < 20% for k > 0.2 hMpc“'^ where signal 
to noise is higher, and deviates more for larger scales. Cor¬ 
recting down bulk velocities by 2 per cent provides a 1 per 
cent agreement at large scales {k ~ O.l/iMpc”^) for both 
the monopole and the quadrupole. 

At the final stage of writing this paper, Hewlett et al. 
(2015a) presented a similar analysis using the code L- 
PICOLA, which is an alternative parallel implementation of 
COLA. Our work has been independently developed from 
theirs and the scope of our analysis is rather different and 
complementary. In particular, they choose coarser and faster 
parameter conhgurations. For example they use the same 
number of mesh cells as particles or less (i.e., PMgrid < 1). 
This reduces substantially the memory and epu-time re¬ 
quirements, but we have shown that this introduces im¬ 
portant systematic effects in the halo mass and abundance. 
In addition, we also considered halo clustering both in real 
and redshift space. Koda et al. (2015) also presented their 
methodology for generating galaxy mock catalogues using 
COLA. They show results in both real and redshift space with 
PMgrid = 3, but they stick to the fiducial case of 10 time 
steps and do not explicitly explore the code parameter space. 

In this paper we have shown how an optimal choice of 
COLA code parameters, plus a minimal halo mass recalibra¬ 
tion, can yield clustering results to per cent agreement with 
respect to full A-body for all mass bins, scales and redshifts 
of interest for the new generation of galaxy surveys. 
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APPENDIX A: TRANSIENT EFFECTS 
CORRECTION IN MICE-GC 

The MICE-GC simulation used first order LPT initial condi¬ 
tions (Zeldovich approximation) at Zi = 100. This approxi¬ 
mation is known to yield transient effects in the distribution 
of matter and haloes mainly depending on the actual ini¬ 
tial redshift used (Scoccimarro 1998; Crocce et al. 2006). In 
this appendix we quantify these effect on the matter power 
spectrum and the halo mass function by using first or sec¬ 
ond order LPT for the concrete configuration of MICE-GC. 
These differences are then corrected in MICE-GC whenever 
we compare any measurement against those in COLA, because 
otherwise it would be a source of a systematic error. We es¬ 
timate them using additional GADGET-2 simulations using 
the same cosmology and starting redshift as in MICE-GC, 
see Sec. 2, and evolving 1024® particles using either first or 
second order LPT. The box size is 768 Mpc in order to 
keep the same mass resolution as in the reference MICE-GC 
A-body simulation. For the mass function we use as well a 
larger box size of 3072 h~^ Mpc in order to have good statis¬ 
tics at the high mass end. 

The upper panel of Fig. Al shows the transient effects 
in the matter power spectrum in real space for redshifts 0, 
0.5 and 1 in solid, dashed and dotted lines respectively. The 
correction is always below 2 per cent in the scales studied 
and remarkably similar to the results that found by (Schnei¬ 
der et al. 2015) using another simulation code (see their fig. 
2). The lower panel displays the mass function and uses the 
same line styles. The vertical line at M = 10®®'® Mq 
marks the matching mass-scale for the two runs used (the 
smaller one for smaller masses and the other way around). 
Remind that halo masses are defined using the Warren cor¬ 
rection and this enables a good overlapping of measurements 
at that matching mass-scale (in agreement with other tests 
for such correction, e.g. Crocce et al. 2010). In the mass 
function, differences are more important at high masses and 
redshifts, going up to 5 per cent at z = 1 for the mass range 
of interest and they are within 3% at 2 = 0. We measured as 
well the correction for the halo-matter cross power spectrum, 
but we found it to be always within the 1 per cent so that 
the effect is negligible. Thus, whenever we show a ratio of ei¬ 
ther mass functions or matter power spectra with respect to 
MICE-GC, we have multiplied it by the corresponding ratio 
shown in Fig. Al. In the case of halo clustering observables 
we find that transient effects are below the 1 per cent level, 
so we consider that we can neglect the correction for those 
measurements. 


APPENDIX B: PERFORMANCE OF COLA WITH 
RESPECT TO PARTICLE-MESH ONLY RUNS 

Tassev et al. (2013) showed that COLA simulations with as 
few as ten time steps recover the matter density field much 
better than just doing either a particle-mesh (PM) only sim¬ 
ulation with the same number of time steps or a 2LPT evo¬ 
lution. In this paper we advocate the use of more time steps 
in order to produce mock catalogues that are accurate in 
a large span of redshifts. After increasing the number of 
time-steps one might think that the 2LPT part of the COLA 
method has a very little contribution to the dynamics and 



Figure Al. Transient effects on the matter power spectrum (up¬ 
per panel) and the mass function (lower panel). Solid, dashed 
and dotted lines correspond to redshifts 0, 0.5 and 1 respec¬ 
tively. We show the ratio of the observables measured on a 
pair of identical full A-body runs differing only in the set up 
of the initial conditions: first order versus second order LPT. 
The simulations used the same particle mass as MICE-GC in 
a box of 768Mpc, except for the mass function plot that for 
M > 10^®'® Mq (marked by a vertical dashed line) used a 
larger box of 3072 h~^ Mpc. 

much of the information comes from the PM integration. 
In this appendix we show the relative impact of the 2LPT 
information when many time steps are used. 

For this exercise we use the FastPM®® parallel imple¬ 
mentation of COLA . We run several PM only and COLA runs 
with 768®particles in a box of 576 Mpc by side, and we 
vary the number of time steps for the PM only runs (the ini¬ 
tial redshift is fixed at Zi = 19). The green line in the upper 
panel in Fig. B1 shows that the PM only method recovers 
less power in the matter power spectrum than COLA for the 
same number of time steps. The deficit is larger at small 
scales and at high redshift (dashed and dotted lines corre¬ 
spond to redshifts 0.0 and 1.5). The plausible explanation 
is that the PM only has more difficulties to accurately inte¬ 
grate the equations of motions at high redshifts, when few 


®® https://github.com/rainwoodman/fastPM. 
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the halo linear bias and found a corresponding excess ex¬ 
hibiting the same trends. Both differences can be explained 
by a systematic underestimation of the halo masses for plain 
PM simulations that is mass and redshift dependent. Both 
panels of Fig. B1 show that, for a similar number of time- 
steps, differences between PM only and COLA decrease to¬ 
wards lower redshifts. As shown by the matter power spec¬ 
trum, discrepancies originate at high redshift, and late non¬ 
linear evolution masks them (in a similar fashion than tran¬ 
sient effects showed in Appendix A). We do not show in Fig. 
B1 full A^-body values since we want to focus only on the rel¬ 
ative effect of both methods. Also, it is clear that plain PM 
simulations converge to COLA in the limit of a large number 
of time steps. 

One might argue that this effect on the mass function 
could apparently solve the overestimation studied in Sec. 4.4. 
But this seems just a cancellation of errors that might intro¬ 
duce even more undesired systematic effects. We conclude 
that it is worth using the COLA method even with as many 
as 40 time steps. Less time-steps (e.g., 20 or fewer) still pro¬ 
duce accurate results for COLA, while plain PM simulations 
show non-negligible biases. 

We note that upon resubmission of this paper, Feng 
et al. (2016) presented FastPM and showed results that are 
consistent with our work. 


Figure Bl. Top panel: matter power spectra for three PM only 
simulations with 20, 40 and 80 time steps from bottom to top. The 
reference simulation is a COLA run with 40 time steps. Dotted and 
dashed lines correspond to redshifts 1.5 and 0 respectively. Bottom 
panel: mass function of a PM only run with respect to a COLA 
one, both with 40 time steps. Plain PM simulations introduce 
additional systematic effects, even with as much as 40 time steps. 

time steps sample each e-fold of the growth of structures^'^, 
and differences persist until z = 0. The PM method slowly 
converges to COLA at large scales by increasing the number 
of time steps. In turn, we recall that COLA reproduces the 
linear growth rate accurately regardless of the number of 
time steps (see Fig. 3). The lower panel in Fig. Bl displays 
the ratio of mass functions between the PM only and COLA 
runs with 40 time steps, for various redshifts. 

There is a clear underestimation that reaches 5 — 10% for 
both high redshifts and high masses. We have also studied 

Note that in those runs, time steps are linearly distributed 
with the scale factor. 
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